Public Transportation Accessibility in Twin Cities
Author
Tina Chen, Shirley Jiang, Cynthia Zhang
Published
April 16, 2024
source('cleaning.R')
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats 1.0.0 ✔ readr 2.1.4
✔ ggplot2 3.4.4 ✔ stringr 1.5.0
✔ lubridate 1.9.2 ✔ tibble 3.2.1
✔ purrr 1.0.2 ✔ tidyr 1.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
Attaching package: 'Hmisc'
The following objects are masked from 'package:dplyr':
src, summarize
The following objects are masked from 'package:base':
format.pval, units
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
Reading layer `school_program_locations' from data source
`/Users/apple/Documents/GitHub/212/STAT212_final_project/data/shp_struc_school_program_locs'
using driver `ESRI Shapefile'
Simple feature collection with 5872 features and 24 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 190534 ymin: 4817361 xmax: 747765 ymax: 5468774
Projected CRS: NAD83 / UTM zone 15N
Reading layer `PostsecondaryEnrollment' from data source
`/Users/apple/Documents/GitHub/212/STAT212_final_project/data/shp_society_post_second_enroll'
using driver `ESRI Shapefile'
Simple feature collection with 212 features and 73 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -97.01679 ymin: 42.49901 xmax: -87.69674 ymax: 48.59
Geodetic CRS: NAD83
Getting data from the 2018-2022 5-year ACS
Warning: • You have not set a Census API key. Users without a key are limited to 500
queries per day and may experience performance limitations.
ℹ For best results, get a Census API key at
http://api.census.gov/data/key_signup.html and then supply the key to the
`census_api_key()` function to use it throughout your tidycensus session.
This warning is displayed once per session.
Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
`summarise()` has grouped output by 'tract', 'population', 'medianIncome',
'White Alone', 'Black or African American Alone', 'Asian Alone', 'American
Indian and Alaska Native Alone'. You can override using the `.groups` argument.
`summarise()` has grouped output by 'tract'. You can override using the
`.groups` argument.
`summarise()` has grouped output by 'nYear'. You can override using the
`.groups` argument.
Mapping sthe Education Sources in Twin Cities
schools_stops %>%ggplot(aes(x = number_school, y = number_of_stops))+stat_summary_bin(fun ="mean", bins =5, geom ="point")+#divides the number of schools by the number of bins, so Bin 1: 0-10 schools, Bin 2: 10:20, etc. geom_smooth(method ="lm")+labs(x ="Mean Number of Schools (Pre and Post-Secondary) in Tract", y ="Mean Number of Bus Stops in Tract")+# avg # of stops in areas with similar # of schoolstheme_classic()
`geom_smooth()` using formula = 'y ~ x'
schools_stops %>%ggplot(aes(x = number_school, y = number_of_stops))+geom_point(alpha =0.5)+geom_smooth(method ="lm")+labs(x ="Number of Schools (Pre and Post-Secondary) in Tract", y ="Number of Bus Stops in Tract")+theme_classic()
`geom_smooth()` using formula = 'y ~ x'
function
map
# need help: don't know how to add title with var?census_var_plot <-function(var, title){# using chosen variable to make the map with bus stopsggplot()+geom_sf(data = census2023, aes(fill = {{var}}))+geom_sf(data = stops_sf_county, color ="green", size =0.1)+scale_fill_gradient(low ="orange", high ="blue")+labs(fill="", title =str_to_title(title))}census_var_perc_plot <-function(var, title){# using the percentage of each race to make the map with bus stopsggplot()+geom_sf(data = census2023, aes(fill = {{var}}/population))+geom_sf(data = stops_sf_county, color ="green", size =0.1)+scale_fill_gradient(low ="orange", high ="blue")+labs(fill="", title =str_c("Percentage of ",title))}for(i in1:2){print(census_var_plot(get(name[i]), name[i]))}
census_point_plot <-function(var, title){# Function for plotting the linear relationship between chosen variables and the number of bus stops stops_census_join_summ %>%ggplot(aes(x = {{var}}, y = number_of_stops))+geom_point()+geom_smooth(method ="lm")+labs(y ="Number of Bus Stops in Tract",title =str_to_title(title))+theme_classic()}census_point_density_plot <-function(var, title){# Function for plotting the linear relationship between# percentage of each ethnicity group and the number of bus stops stops_census_join_summ %>%ggplot(aes(x = {{var}}/population, y = number_of_stops))+geom_point()+geom_smooth(method ="lm")+labs(y ="Number of Bus Stops in Tract", title =str_c("Percentage of ",title))+theme_classic()}for(i in1:2){print(census_point_plot(get(name[i]), name[i]))}
# total ridership over 5 yearsused_ridership_all %>%group_by(nYear) %>%mutate(Total_Riders =sum(Total_Riders,na.rm =TRUE)) %>%ggplot(aes(x=nYear, y=Total_Riders))+geom_point()+geom_line()+labs(x ="Year", y ="Total Riders")
# 2 comparisons included: weekday vs weekend, and among each yearweekends_weekdays %>%ggplot(aes(x= nYear, y = Total_Riders)) +geom_point()+geom_line()+facet_wrap(~Schedule)+labs(x ="Year", y ="Total Riders")
used_ridership_all %>%filter(!(rte_class %in%c("CommRai","SuburbL","Support"))) %>%group_by(nYear, rte_class) %>%mutate(Total_Riders =sum(Total_Riders,na.rm =TRUE)) %>%ggplot(aes(x=nYear, y = Total_Riders)) +geom_point()+geom_line()+facet_wrap(~rte_class)+labs(x ="Year", y ="Total Riders")
# total ridership for BRT increase because increasing linesused_ridership_all%>%filter(rte_class =="BRT") %>%distinct(nYear,Route) %>%group_by(nYear) %>%summarise(num_BRT =n()) %>%ggplot(aes(x =nYear, y = num_BRT))+geom_point()+geom_line() # the increase of BRT can replace one of the orginal line
outlier <-rep(0, cleaned_stops_census%>%nrow() )outlier[as.numeric(names(influential)) ] <-1outlier_info <- cleaned_stops_census %>%cbind(outlier = outlier) %>%filter(outlier ==1)outlier_info_sf <-st_as_sf(outlier_info)# We see that 22 tract have a Cook’s Distance greater than 3x the mean.# how many sd from the linear regression because of unit differenceggplot()+geom_sf(data = census2023)+geom_sf(data = outlier_info_sf, fill ="red")+geom_sf(data = stops_sf_county, color ="green", size =0.1, alpha =0.3)
# geom_text(data = outlier_info_sf, aes(label = tract, x = outlier_info_sf[["geometry"]][["X"]], y = outlier_info_sf[["geometry"]][["Y"]]), color = "black", size = 3)
Correlation Graph of All Variables:
# correlation <- function(df, var) {# cor_value <- cor(df[[var]], df$number_of_stops, use = "complete.obs")# return(cor_value)# }correlation_values <-vector("list", length =7)columns_to_iterate <-c("population", "median_income", "white_alone", "black_or_african_american_alone", "asian_alone", "american_indian_and_alaska_native_alone", "native_hawaiian_and_other_pacific_islander_alone")for (col_name in columns_to_iterate) {# calculate correlation values cor_value <-cor(cleaned_stops_census[[col_name]], cleaned_stops_census$number_of_stops, use ="complete.obs")# confidence intervals mod_formula_str <-str_c("number_of_stops", "~", col_name) mod_form <-as.formula(mod_formula_str) mod <-lm(mod_form, data = cleaned_stops_census) ci <-confint(mod, level =0.95)# table of data correlation_values[[col_name]] <-tibble(variable = col_name,correlation = cor_value,ci_lower = ci[variable, "2.5 %"],ci_upper = ci[variable, "97.5 %"],slope = mod$coefficients[variable] )}correlation_dataframe <-bind_rows(correlation_values)head(correlation_dataframe)
correlation_dataframe %>%ggplot(aes(x = variable, y = slope))+geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width =0.2)+geom_point(size =2)+labs(x ="Strength of Different Variables in Relation to Number of Stops")+theme_minimal()
Identifying the Outliers:
# demographics as per capitaoutliers_demographic_pc <- cleaned_stops_census %>%mutate(white_pc = white_alone/population, black_pc = black_or_african_american_alone/population,asian_pc = asian_alone/population,american_indian_and_alaska_native_pc = american_indian_and_alaska_native_alone/population,native_hawaiian_other_pi_pc = native_hawaiian_and_other_pacific_islander_alone/population)# The stop outliers that have more than 100 total number of bus stops are:outliers_demographic_pc %>%filter(number_of_stops >100) %>%arrange(desc(number_of_stops))
Tract 1261.02 is located near the U of M campus in Minneapolis which supports the hypothesis that there would be more stops (greater access to public transportation) for individuals who likely use it more - students. It also has the largest population size out of our narrowed-down list. This would also likely be a viable area for professionals to reside away from downtown Minneapolis but still in commuting distance to their jobs.
Tract 1261.02
Tract 265.14 is located in Plymouth, MN a large suburb near the Twin Cities. That explains their relatively high median income levels. This tract is also mainly where the bus stops are located to get around the suburb, explaining the high level of stops.
Tract 265.14
Tract 1044 is located in central Minneapolis which also makes intuitive sense why it is in the top three for number of stops. There is often more accessibility located in the downtown areas of a city to provide ongoers an easy way to move around. There is likely less population because downtowns tend to be where businesses are located, so many workers probably commute to work from other locations.
Tract 1044
Tract 251 is one the largest tract sizes which might explain partly why it has so many bus stops. However, another reason might be because Mall of America is located within this tract which prompts there to be more public transportation around this area to get to the Mall as it is a major destination within the Twin Cities.
> Tract 342.01 is downtown St. Paul. Similar to the city cases located above, downtown St. Paul would have a characteristically higher number of bus stops to allow citygoers a more convenient way to get downtown or move around downtown. Once again, less people tend to live downtown but rather in the areas around.
Tract 342.01
Tract 216.02 is in Golden Valley, MN. Golden Valley is a first-ring suburb of Minneapolis meaning that there is still public transportation access to provide the residents to commute into the city as they so please. They would need more stops in order to fully supply the residents their trips into the city.